applications like the de novo genome assembly, variant discovery, and epigenetics. Usually,
genomes have long repeated sequences ranging from hundreds to thousands of bases that
are hard to cover with the short reads produced by NGS. The foundation for TGS emerged
in 2003, when DNA polymerase was used to obtain a sequence of 5 bp from a single DNA
molecule using fluorescence microscopy [4]. Single-molecule sequencing (SMS) then
evolved to include (i) direct imaging of individual DNA molecules using advanced
microscopy techniques and (ii) nanopore sequencing technologies, in which a single
molecule of DNA is threaded through a nanopore and its bases are detected as they pass
through the pore. Although TGS provides long reads (from a few hundred to thousands of
base pairs), this may come at the expense of accuracy. Recently, however, the accuracy
of TGS has greatly improved. TGS provides long reads that can enhance de novo assembly
and enable direct detection of haplotypes and higher consensus accuracy for better
variant discovery. In general, two TGS technologies are currently available: (i) Pacific
Biosciences (PacBio) single-molecule real-time (SMRT) sequencing and (ii) Oxford Nanopore
Technologies (ONT) sequencing.
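The point about repeats can be made concrete with a small sketch. It assumes, as a simplification, that a read resolves a repeat only if it covers the whole repeat plus some unique flanking sequence on each side. The function name `spans_repeat`, the default `min_flank` of 20 bases, and the example read and repeat lengths are illustrative assumptions, not platform specifications.

```python
def spans_repeat(read_len: int, repeat_len: int, min_flank: int = 20) -> bool:
    """Return True if a read is long enough to cover the whole repeat plus
    at least `min_flank` unique bases on each side (assumed criterion)."""
    return read_len >= repeat_len + 2 * min_flank


# Illustrative lengths in bases (assumed values for demonstration only)
reads = {"short read": 150, "long read": 10_000}
repeat_len = 2_000  # a repeat of a few thousand bases

for name, length in reads.items():
    verdict = "can span" if spans_repeat(length, repeat_len) else "cannot span"
    print(f"{name} ({length} bases) {verdict} a {repeat_len}-base repeat")
```

Under these assumptions the short read falls entirely inside the repeat and cannot be placed unambiguously, whereas the long read reaches unique sequence on both sides.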
1.2.3.1 PacBio Technology
Pacific Biosciences (PacBio) sequencing can provide long reads that range between
500 and 50,000 bp. PacBio sequencing has improved since its debut in 2011.
The underlying technology is SMRT sequencing, in which a single DNA molecule is
sequenced and the bases are called in real time, while the sequencing is in
progress [5]. The sequencing steps include fragmentation and ligation of
adaptors to the DNA template for library generation. Special hairpin (loop) adaptors are
ligated to both ends of each double-stranded DNA (dsDNA) fragment. The loop adaptors link
the two strands, forming structures called linear DNA SMRTbells. The sequencing takes place
in nano wells on a flow cell. The nano wells, called zero-mode waveguides (ZMWs), are
fabricated on silicon dioxide chips [6]. A cell contains thousands of ZMWs. A ZMW is around
70 nm in diameter and 100 nm in depth, and it allows laser light to come through the
bottom to excite the fluorescent dye. A DNA polymerase is attached to the bottom of the nano
well. When a single DNA fragment is added to the well, the DNA polymerase binds to it.
The polymerase has strand-displacement activity, which opens the linear DNA SMRTbell
into a circular structure called a circular DNA SMRTbell (Figure 1.5). The DNA polymerase
then continues adding nucleotides to form a complementary strand for both the forward
and reverse strands. PacBio uses a sequencing-by-synthesis (SBS) approach. Four
fluorescently labeled nucleotides (dNTPs) are added to the reaction. The dNTPs are labeled
by attaching the fluorescent dyes to the phosphate chain of the nucleotides. Each time a fluorescently
labeled nucleotide is incorporated, its dye is excited by the light coming through the
bottom of the well and the fluorescence is detected in real time. The dye is then cleaved
from the growing nucleic acid chain before the next nucleotide is added. The real-time
identification of the incorporated labeled nucleotides allows base calling. This process of
nucleotide addition and fluorescence detection continues until the entire fragment is
sequenced. Because the template is circular, the pass can be repeated multiple times, and
the passes are combined into a circular consensus sequence (CCS) to generate more accurate
reads. Ten passes produce reads with 99.9% accuracy. These reads are